Bring back agi::fs::path to ensure UTF-8 paths #231
See the message for the second commit, also pasted here:
On Windows, std::filesystem::path internally stores paths in UTF-16,
but constructing an std::filesystem::path from a string reads that
string in Windows-1252 or some other non-UTF-8 narrow encoding. This
breaks all kinds of code that previously assumed that one could simply
convert between UTF-8 strings, wstrings, and paths freely.
Before the switch from boost::filesystem to std::filesystem, this was
solved by using boost::filesystem::path::imbue to configure
boost::filesystem to always use UTF-8. However, there is no equivalent
function for std::filesystem. It seems that the encoding used can be
controlled to some degree using the C and C++ locales, but changing
these to UTF-8 breaks other things (and global locales are a headache
in general. I won't pull a wm4 here but you probably know what I mean).
So, there does not seem to be any easy solution to this. Aegisub also
isn't the only program to have this problem, see e.g.
https://www.bunkus.org/2021/03/converting-a-c-code-base-from-boostfilesystem-to-stdfilesystem/
As far as I can see, the three options are
- [...] This feels risky, might not work on all systems, and could break in
  the future.
- [...] strings and paths (Yeah, no)
- [...] std::filesystem::path by forcing all conversions from and to
  std::string to use UTF-8.
So, here we are. It doesn't feel great to have another reinvention of
something that shouldn't be Aegisub's responsibility in the first place,
and we just got rid of all the agi::fs wrapper code, but this seems
like the only sane way to be sure that all conversions happen the way we
expect. I guess since agi::fs wraps std::filesystem and not
boost::filesystem this time, it's still better than before.
Incidentally, std::u8string seems to be kind of a meme too. The idea of
being explicit about your string being UTF-8 is great, but how is there
not even a standard function to reinterpret a string as UTF-8 or
vice-versa?? Let alone support in any other string handling or I/O
functions.
The changeset is pretty big, but the main changes are in fs.h and fs.cpp.
The rest is just a few find-and-replace passes and a handful of manual fixes.
Finally, it should be noted that conversion between
std::filesystem::paths and std::wstrings is broken on gcc <= 11:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048
This is what currently causes the added lagi_mru.add_entry_utf8 test
to fail on the Ubuntu CI. Clang and newer versions of gcc work, though.
Fixes #219.